Adsorption of organic compounds on carbon nanotubes (CNTs), governed by interactions between
molecules and CNT surfaces, is critical for their fate, transport, bioavailability and toxicity in the
environment. Here, we report a promising concentration-dependent polyparameter linear free energy
relationships (pp-LFERs) model that describes compound-CNT interactions and predicts the sorption
behavior of chemicals on CNTs over a wide range of concentrations (more than five orders of magnitude).
The developed pp-LFERs capture the dependence of the distribution coefficient k_i on equilibrium
concentration. The pp-LFERs indexes [r, p, a, b, v], which represent different interactions, are found to
have a good relationship with the aqueous equilibrium concentrations of compounds. This modified model
can successfully interpret the relative contribution of each interaction at a given concentration and
reliably predict sorption of various chemicals on CNTs. This approach is expected to help develop a better
environmental fate and risk assessment model.



Carbon nanotubes (CNTs) have attracted enormous interest since 1991 1 . Due to their unique properties,
they have promising applications in medical, material and environmental sciences. They can be used as
superior sorbents to treat wastewater or act as carriers in drug delivery systems. These applications
require knowledge of the adsorption of organic compounds onto CNTs, which can help us better understand
the environmental and health impacts of both CNTs and chemicals 2 . Developing rapid methods for predicting
the sorption behavior of these compounds on CNTs is therefore important and urgently needed.

The pp-LFERs approach has gained increasing acceptance and application in environmental chemistry and
contaminant fate modeling 3-11 . It explicitly describes the contributions to the free energy change from
multiple kinds of molecular interactions with both water and the bulk phase of the sorbent 3 .
Recently, Xia et al. successfully applied this approach to predict the adsorption of various chemicals onto
CNTs and 12 other nanomaterials at a low concentration 9 . Their approach is based on the fundamental forces
of molecular interactions and can be expressed as follows:



$$\log k_i = c + rR_i + p\pi_i + a\alpha_i + b\beta_i + vV_i, \quad i = 1, 2, 3, \ldots, n \qquad (1)$$



where k_i is the adsorbent-water distribution coefficient, n is the number of probe compounds, and
[R_i, π_i, α_i, β_i, V_i] are the molecular descriptors of the ith probe compound: R_i is the excess molar
refraction, π_i is the polarity/polarizability parameter, α_i and β_i are the hydrogen-bond acidity and
basicity, respectively, and V_i is the McGowan characteristic volume. The regression coefficients
[r, p, a, b, v] are defined as nanodescriptors that indicate the differential compound-CNT interactions,
and c is the regression constant.

However, up to now, pp-LFERs have mostly been applied successfully only within a narrow, low
concentration range of solute, or in situations where the k_i value did not vary significantly with solute
concentration (which usually requires a linear sorption isotherm). Those applications exclude the well-known
concentration dependence of k_i, which is in fact relevant for any sorbate that is capable of displacing
water on naturally occurring adsorption sites. In real environments, organic compounds occur over a wide
range of concentrations and their sorption isotherms are often (if not always) found to be nonlinear.
Xia et al. also suggested a concentration dependency of pp-LFERs parameters for organic compound sorption
to nanomaterials 9 , but the goal of their work was to characterize surface adsorption properties as a
biologically relevant characterization metric to classify nanomaterial surfaces across nanoparticle types,
which specifically requires the lowest concentration to eliminate nonlinearity. Hence there is a great
need to take the concentration effect into consideration.

In this work, a new model was developed (Supplementary Table S1) using a set of 16 compounds (aqueous
solubility C_s ranging from 0.135 to 80190 mg/L, R_i from 0.805 to 2.06, π_i from 0.84 to 1.93, α_i from
0 to 0.82, β_i from 0.2 to 0.41, V_i from 0.775 to 1.585) for a multi-walled CNT with 8-15 nm outer
diameter (MWCNT15) and 10 compounds (C_s ranging from 31.7 to 26300 mg/L, R_i from 0.871 to 1.43, π_i from
0.92 to 2.42, α_i from 0 to 0.82, β_i from 0.1 to 0.47, V_i from 0.891 to 1.239) for a pristine MWCNT
(P-MWCNTs), a graphitized MWCNT (G-MWCNTs), a carboxylated MWCNT (COOH-MWCNTs) and a hydroxylated MWCNT
(OH-MWCNTs). This model combines pp-LFERs parameters with the aqueous equilibrium concentrations of
compounds in order to identify and quantify the significant factors that govern the adsorption properties
of CNTs over a wide range of concentrations. The selected compounds have diverse physicochemical
properties and have been widely used as probe compounds for pp-LFER modeling. Xia et al. used 28 compounds
(R_i ranging from 0.604 to 1.36, π_i from 0.5 to 1.15, α_i from 0 to 0.7, β_i from 0.07 to 0.66, V_i from
0.775 to 1.324) to predict the sorption of compounds on MWCNTs 9 . The descriptor [R_i, π_i, α_i, β_i, V_i]
ranges of our probe compound sets are comparable to theirs. From the quantitative structure-activity
relationship (QSAR) viewpoint, the number of compounds may be a bit low, but it does allow for the
construction of a predictive model. Classic QSARs based on similar or even smaller sets of compounds have
been successfully established and published elsewhere 12-14 .

Results 

Supplementary Table S2 shows the correlation coefficient (R^2), root mean square error of calibration
(RMSEC), cross-validated correlation coefficient (Q^2_CV) and cross-validated root mean square error
(RMSECV) values of the pp-LFERs applied over a wide range of concentrations (log C_e/C_s from -5 to 0,
where C_e is the equilibrium concentration and C_s is the solubility of the compound) for MWCNT15. The
difference between R^2 and Q^2_CV did not exceed 0.3 15 and the Q^2_CV values were greater than 0.7 16 ,
suggesting that pp-LFERs can be applied well at all tested concentrations. A detailed description of R^2,
RMSEC, Q^2_CV and RMSECV is given in the Methods section. The pp-LFERs parameters of MWCNT15 at different
equilibrium concentrations C_e are shown in Figure 1. The London dispersion (v, 1.08 to 5.38) and the
dipolarity/polarizability (p, 0.61 to 1.79) are two important molecular interactions. Hydrogen-bond
basicity (b, -2.97 to -10.9) and hydrogen-bond acidity (a, -3.33 to -1.67) have negative values, which
suggests that the sorbent surface has a weaker tendency than water to donate protons to, or accept protons
from, the probe compounds. These findings are consistent with the work of Xia et al. 9 . However, all the
relative interaction strengths varied with C_e (v, b and a increase with increasing C_e to reach a plateau
and then decrease, whereas p decreases with increasing C_e), which should result from the interactions
among solute molecules, water molecules and the nanotube surface (Figure 2). In aqueous solution, a water
molecule can be either a hydrogen-bond donor or acceptor, which results in competitive sorption with
organic solutes at hydrophilic adsorption sites 2 . Water molecules can also compete with organic compounds
on less hydrophilic adsorption sites 17 . Hence, at relatively low C_e, as there are plenty of sorption
sites, chemical molecules will prefer to sorb on the sites with high energy 18 . With increasing C_e, solute
molecules will gradually sorb on less energetic sites and the competing ability of water molecules for
sorption sites will be relatively greater, which results in increases of a, b and v. Meanwhile, as C_e
increases, the sorption sites available to solute molecules continue to decrease, which might then change
the sorption mode of solute molecules from planar (occupying more sorption sites per molecule) to
end-by-end (occupying fewer sorption sites per molecule) 19 . This will lead to an increase in v and a
decrease in p. As C_e continues to increase, there are not enough sites for sorption and the intermolecular
interactions between solute molecules may increase, so that a, b, v and p would be reduced.

Moreover, all the pp-LFERs parameters and the regression constant c were found to have a good relationship
with log C_e/C_s. Shih and Gschwend developed a concentration-dependent pp-LSERs model based on the sorption
of 14 organic chemicals on activated carbon 13 . Their model indicated linear relationships between pp-LFERs
parameters and log C_e/C_s (Equation 7 in their paper). This does not hold for MWCNTs: as shown in Figure 1,
the relationships are not linear. In addition, a linear relationship between pp-LFERs parameters and
equilibrium concentration was not statistically significant in polynomial regression for MWCNTs
(Supplementary information Section 2.2). In our study, a scaling factor C_s was introduced. Good quadratic
polynomial regressions were obtained, as indicated by the P values of the Normality and Constant Variance
tests (>0.05) and the incremental P value (<0.001) (Supplementary Table S3).

A new model is therefore obtained as follows:

$$\log k_i = f_1 + f_2 R_i + f_3 \pi_i + f_4 \alpha_i + f_5 \beta_i + f_6 V_i, \quad i = 1, 2, 3, \ldots, n \qquad (2)$$

$$\begin{aligned}
f_1 &= 0.528\,(\log(C_e/C_s))^2 + 1.23\,\log(C_e/C_s) - 4.25 \\
f_2 &= 0.0204\,(\log(C_e/C_s))^2 + 0.265\,\log(C_e/C_s) + 0.229 \\
f_3 &= 0.0644\,(\log(C_e/C_s))^2 + 0.118\,\log(C_e/C_s) + 0.849 \\
f_4 &= -0.207\,(\log(C_e/C_s))^2 - 0.934\,\log(C_e/C_s) - 2.84 \\
f_5 &= -0.589\,(\log(C_e/C_s))^2 - 1.58\,\log(C_e/C_s) - 4.04 \\
f_6 &= -0.450\,(\log(C_e/C_s))^2 - 1.74\,\log(C_e/C_s) + 3.85
\end{aligned}$$

where f_1, f_2, f_3, f_4, f_5 and f_6 are quadratic functions of log C_e/C_s. The functions f_2, f_3, f_4,
f_5 and f_6 describe the relative contribution of interaction strength from the molecular force of
lone-pair electrons, dipolarity/polarizability, hydrogen-bond acidity, hydrogen-bond basicity and
hydrophobic interactions, respectively. For a given target compound, [R_i, π_i, α_i, β_i, V_i] and C_s are
constant. Thus the above equation contains only one unknown parameter, C_e. Equation (2) is the
concentration-dependent pp-LFERs model developed in this work.
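
As a minimal sketch of how equation (2) can be evaluated, the following Python snippet computes the six concentration-dependent coefficients for MWCNT15 and the resulting log k_i. The function and variable names are illustrative, the logarithms are assumed to be base 10, and the descriptor values in the usage example are hypothetical placeholders rather than data from this study. The coefficients were fitted for log(C_e/C_s) between -5 and 0, so values outside that range would be extrapolations.

```python
import math

# Quadratic coefficients (q2, q1, q0) of each f_j in equation (2) for MWCNT15,
# where f_j = q2 * x**2 + q1 * x + q0 and x = log10(C_e / C_s).
F_COEFFS = {
    "f1": (0.528, 1.23, -4.25),     # regression constant
    "f2": (0.0204, 0.265, 0.229),   # multiplies R_i (excess molar refraction)
    "f3": (0.0644, 0.118, 0.849),   # multiplies pi_i (dipolarity/polarizability)
    "f4": (-0.207, -0.934, -2.84),  # multiplies alpha_i (H-bond acidity)
    "f5": (-0.589, -1.58, -4.04),   # multiplies beta_i (H-bond basicity)
    "f6": (-0.450, -1.74, 3.85),    # multiplies V_i (McGowan volume)
}


def f_values(log_ratio):
    """Return f_1..f_6 at a given x = log10(C_e / C_s)."""
    return {
        name: q2 * log_ratio ** 2 + q1 * log_ratio + q0
        for name, (q2, q1, q0) in F_COEFFS.items()
    }


def predict_log_k(R, pi, alpha, beta, V, c_e, c_s):
    """Evaluate equation (2); c_e and c_s must share the same units (e.g. mg/L).

    Assumption: base-10 logarithms, as is conventional for pp-LFERs.
    """
    f = f_values(math.log10(c_e / c_s))
    return (f["f1"] + f["f2"] * R + f["f3"] * pi
            + f["f4"] * alpha + f["f5"] * beta + f["f6"] * V)


# Hypothetical descriptors, for illustration only (not a compound from this study).
print(predict_log_k(R=1.0, pi=1.0, alpha=0.0, beta=0.0, V=1.0, c_e=0.1, c_s=100.0))
```

Keeping the coefficients in a single table makes it straightforward to swap in the corresponding fits for the other MWCNTs, should those be available.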

Figure 1 | Relative molecular interaction strengths on MWCNT15 varied with different log(C_e/C_s) values.
(a) Regression constant (c); (b) regression coefficient r of the excess molar refraction (R); (c) regression
coefficient p of the effective solute dipolarity and polarizability (π); (d) regression coefficient a of the
effective solute hydrogen-bond acidity (α); (e) regression coefficient b of the effective solute
hydrogen-bond basicity (β); (f) regression coefficient v of the McGowan characteristic volume (V);
(g) comparison of interaction strengths at different C_e. Error bars represent the standard errors of the
regression analysis.

The prediction capability was evaluated in two steps: first the data were split into training and
validation sets, and then the model was validated via internal and external validation 4,9,12
(Supplementary information Sections 2.3-2.5). To obtain an appropriate validation, we split the data into
training and external validation sets as follows (Supplementary Table S4). First, we sorted the 16
compounds by decreasing maximum log k_i value (log k_i,max). Second, the data were split into three sets.
Pyrene and aniline, which have the highest and the lowest log k_i,max values, were grouped into the
validation set V_2 to represent compounds that are not within the range of the training set. The remaining
14 compounds were split into two sets: the training set (T) and the validation set (V_1). To ensure that
the V_1 set is evenly distributed within the range of log k_i,max values of the training set, we used the
following splitting pattern: T-T-T-V_1-T-T-T-V_1-T-T-T-V_1-T-T. The R^2 (>0.967) and RMSEC (<0.201) of the
internal validation suggest a good fit of the model. The Q^2_CV (>0.949) and RMSECV (<0.428) of the
internal validation reveal the robustness of the predictive model (Supplementary Table S5). The R^2
(>0.938) and RMSEC (<0.38) obtained from external validation indicate satisfactory predictivity for the
external validation compounds (Supplementary Table S6). Figure 3 clearly shows the good predictivity for
both training and validation sets. The applicability domain of the model was verified using a Williams plot
(Figure 4). All the training and validation compounds at the various equilibrium concentrations are within
the chemical domain, suggesting that there are no outliers and that the predictivity of the model is
reliable. Using the same approach, we also built models for P-MWCNTs, G-MWCNTs, COOH-MWCNTs and OH-MWCNTs.
Good predictivity was obtained (Supplementary Figures S1-S4), which further validated the applicability of
the new model.
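
The splitting scheme described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the code used in the study: the function name is an assumption, and the compound names and log k_i,max values in the example are hypothetical.

```python
def split_compounds(log_k_max):
    """Split compounds into training (T), V1 and V2 sets.

    log_k_max maps a compound name to its maximum log k_i value.
    """
    # Rank compounds by decreasing maximum log k_i.
    ranked = sorted(log_k_max, key=log_k_max.get, reverse=True)
    # V2: the two extremes, which lie outside the range of the training set.
    v2 = [ranked[0], ranked[-1]]
    training, v1 = [], []
    # The remaining compounds follow the repeating pattern T-T-T-V1.
    for position, name in enumerate(ranked[1:-1], start=1):
        (v1 if position % 4 == 0 else training).append(name)
    return training, v1, v2


# Hypothetical compounds and log k_i,max values, for illustration only.
example = {f"compound_{i:02d}": 6.0 - 0.3 * i for i in range(16)}
training, v1, v2 = split_compounds(example)
print(len(training), len(v1), len(v2))
```

With 16 compounds this yields 11 training compounds, 3 compounds in V_1 and 2 in V_2, matching the pattern T-T-T-V_1-T-T-T-V_1-T-T-T-V_1-T-T for the 14 non-extreme compounds.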



Discussion 

Direct applications of this model are described below using MWCNT15 as an example. First, it can predict
the adsorption of organic molecules onto MWCNTs at any given concentration, which is a critical process for
MWCNTs in biological and environmental systems. Endocrine disrupting compounds (EDCs) and pharmaceuticals
are trace organic contaminants that have been detected in aquatic environments. They can mimic or
antagonize natural hormones, hinder metabolic processes, occupy hormone receptors, and cause reproductive
and developmental problems when consumed by humans and aquatic species 20,21 . The predicted log k_i for
selected EDCs and pharmaceuticals was obtained by inputting solute descriptors and the C_e/C_s value into
the newly developed model (equation (2)). Good predictions were obtained compared with data from the
literature (Figure 5a and Supplementary Table S7). The sorption of DNA bases onto CNTs offers a remarkable
set of technologically useful properties such as facilitation of CNT sorting, chemical sensing, and
detection of DNA hybridization 22 . Based on the new approach, the adsorbent-water distribution coefficient
k_i, which can be regarded as the sorption affinity, was calculated (Figure 5b). There are conflicting
reports about the sorption affinity of DNA bases on CNTs. Xia et al. predicted the order of sorption
affinity as adenine > thymine > guanine > cytosine 9 . Johnson et al. and Gowtham et al. found the order
guanine > adenine > thymine > cytosine 22,23 . From Figure 5b, we can conclude that the sorption affinity
order varies with C_e, and the reported orders can each be found in different C_e ranges.

Figure 2 | Schematic plot of adsorption of organic compounds on multiple sites on the CNT surface at
various concentrations. At relatively low C_e, as there are plenty of sorption sites, chemical molecules
will prefer to sorb on the sites with high energy. With increasing C_e, solute molecules will gradually
sorb on less energetic sites and the competing ability of water molecules for sorption sites will be
relatively greater, which results in increases of a, b and v. Meanwhile, as C_e increases, the sorption
sites available to solute molecules continue to decrease, which might then change the sorption mode of
solute molecules from planar (occupying more sorption sites per molecule) to end-by-end (occupying fewer
sorption sites per molecule). This will lead to an increase in v and a decrease in p. As C_e continues to
increase, there are not enough sites for sorption and the intermolecular interactions between solute
molecules may increase, so that a, b, v and p would be reduced.

Secondly, this model can also explain how the relative contributions of molecule-interface interactions
vary with equilibrium concentration. The detailed calculation of the relative contribution of interactions
is illustrated in Supplementary Section 4.1. Figure 5c shows the predicted sorption energy of guanine on
MWCNT15. We can conclude that the main interactions are hydrophobic and π-π stacking interactions,
consistent with several previous studies 22-24 . At low C_e (<0.000753 mg/L), the contributions followed
the order: π-π stacking interaction > hydrophobic interaction > lone-pair electrons interaction >
hydrogen-bond acidity interaction > hydrogen-bond basicity interaction. At C_e > 0.000753 mg/L, the
hydrophobic interaction became the dominant interaction.

Our new sorption model could open a quantitative way to establish a more accurate environmental fate and
risk assessment model. It is important to note that this new approach might apply to other materials in
addition to CNTs, since the original pp-LFERs approach was also found to fit additional nanomaterials
(AgP, TiO2, ZnO, CuO, NiO, Fe2O3, SiO2, C60, nC60, etc.) 9,25 . In addition, environmental conditions (pH,
ionic strength, temperature, dissolved organic matter) will greatly affect organic compound sorption on
CNTs. For example, elevated pH generally increases the ionization, solubility and hydrophilicity of
ionizable organic chemicals and thus decreases their adsorption on CNTs 26 . At low ionic strength, an
increase in salt concentration leads to a corresponding increase in the attachment efficiency of CNTs 27 ,
which will then decrease organic compound sorption. Hence further studies should address the modeling of
other nanomaterials and adsorbents under different environmental conditions using this new approach.

Methods 

Single-solute adsorption isotherm data were obtained from our previous papers 17,28,29 . For each compound,
we used sorption data at 24 different concentrations in order to compensate for the mathematical
identifiability issue caused by adding C_e to the model. The adsorbent-water distribution coefficient k_i
(if not provided) was calculated according to the following equation:

$$k_i = Q / C_e \qquad (3)$$

where Q (mg g^-1) is the equilibrium sorbed concentration and C_e (mg L^-1) is the equilibrium
solution-phase concentration. The physical/chemical properties, including solubility (C_s) and the solute
descriptors [R_i, π_i, α_i, β_i, V_i], were obtained from the corresponding references and the Absolv
program in the ADME Suite software (Advanced Chemistry Development), respectively (Table S1).

Figure 3 | Sorption isotherms of organic compounds on MWCNT15. Open squares (□): data obtained from the
original reference; dashed lines: predictions based on the modified pp-LFERs model.

The correlation coefficient R^2 and the root mean square error of calibration RMSEC were used as the two
measures of the goodness of fit of the model. They can be expressed as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(Y_i^{exp} - Y_i^{pred}\right)^2}{\sum_{i=1}^{n} \left(Y_i^{exp} - \bar{Y}^{exp}\right)^2} \qquad (4)$$

$$RMSEC = \sqrt{\frac{\sum_{i=1}^{n} \left(Y_i^{exp} - Y_i^{pred}\right)^2}{n}} \qquad (5)$$

where Y_i^exp is the experimental value for the ith sample, Y_i^pred is the predicted value for the ith
sample, Ȳ^exp is the mean of the experimental values, and n is the number of samples.
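
For readers who want to reproduce these two goodness-of-fit measures, the following is a minimal Python sketch of equations (4) and (5). The function names and the numbers in the example are illustrative assumptions, not values from this study.

```python
import math


def r_squared(y_exp, y_pred):
    """Equation (4): 1 - (residual sum of squares / total sum of squares)."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def rmsec(y_exp, y_pred):
    """Equation (5): root mean square error over the n calibration samples."""
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    return math.sqrt(ss_res / len(y_exp))


# Hypothetical experimental and predicted log k_i values, for illustration only.
y_exp = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_exp, y_pred), rmsec(y_exp, y_pred))
```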

The robustness of the model was studied by internal cross-validation using the leave-one-out technique
(CV-LOO). In the CV-LOO algorithm, each compound is removed from the data, one at a time. Thus, n reduced
models were calculated; each of these models was developed with the remaining n-1 compounds and used to
predict the sorption coefficient of the removed compound. The cross-validated correlation coefficient
Q^2_CV and the cross-validated root mean square error RMSECV were calculated from the equations below:

$$Q^2_{CV} = 1 - \frac{\sum_{i=1}^{n} \left(Y_i^{exp} - Y_i^{predcv}\right)^2}{\sum_{i=1}^{n} \left(Y_i^{exp} - \bar{Y}^{exp}\right)^2} \qquad (6)$$

$$RMSECV = \sqrt{\frac{\sum_{i=1}^{n} \left(Y_i^{exp} - Y_i^{predcv}\right)^2}{n}} \qquad (7)$$
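
The leave-one-out procedure can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the SPSS workflow used in the study: the regression is solved through the normal equations with a small Gauss-Jordan solver, each row of X is assumed to start with a 1 for the regression constant, and the data in the example are hypothetical.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def loo_cv(X, y):
    """Return (Q2_CV, RMSECV) following equations (6) and (7)."""
    preds = []
    for i in range(len(y)):
        # Refit the model without sample i, then predict sample i.
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(sum(c * x for c, x in zip(coef, X[i])))
    mean_y = sum(y) / len(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot, math.sqrt(press / len(y))


# Hypothetical data: one descriptor plus the constant column, for illustration only.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 3.2, 4.9, 7.1, 9.0]
q2, rmsecv = loo_cv(X, y)
print(q2, rmsecv)
```

For the full model, each row of X would hold the constant and the five descriptors of one compound at a fixed log(C_e/C_s).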



Figure 5 | Application of the modified pp-LFERs model. (a) Plot of predicted log k_i (log k_i,pred) versus
experimentally measured log k_i (log k_i,exp) for the sorption of EDCs and pharmaceuticals on MWCNT15 using
the concentration-dependent pp-LFERs. The dashed line represents the 1:1 line. (b) Predicted sorption
affinity k_i of four DNA bases, guanine (G), adenine (A), thymine (T) and cytosine (C), at various
equilibrium concentrations on MWCNT15, showing the often observed concentration-dependent sorption
coefficients. (c) Predicted sorption energy of guanine on MWCNT15 at different equilibrium concentrations,
showing the contributions of the excess molar refraction (R), the effective solute dipolarity and
polarizability (π), the effective solute hydrogen-bond acidity (α), the effective solute hydrogen-bond
basicity (β) and the McGowan characteristic volume (V), together with the total sorption energy.

The multiple linear regression analysis was conducted using SPSS 18.0. Polynomial regressions between
pp-LFER parameters and log C_e/C_s were run in SigmaPlot 11.0 using the Incremental Order Polynomial
Regression program, which displays the regression equation for each polynomial order, starting with zero
order and increasing to the specified order. Following the SigmaPlot user guide, Normality and Constant
Variance tests were employed to check the assumptions of polynomial regression, namely that (1) the source
population is normally distributed about the regression and (2) the variance of the dependent variable in
the source population is constant regardless of the value of the independent variables. Both tests were
set to reject at p = 0.05; that is, a test passes if the p value computed by the test is greater than
0.05. The F test was used to assess the ability of the independent variable to predict the dependent
variable. We used the incremental F value (F_incre) to gauge the increase in contribution of each added
order of the independent variable in predicting the dependent variable. It is the ratio of the regression
variation from the dependent variable mean to the residual variation about the regression curve. If
F_incre is large, we can conclude that adding that order of the independent variable predicts the
dependent variable significantly better than the previous model. The first model that shows a significant
increase in the incremental F value is generally the best model to use. Because the R^2 value increases as
the order increases, we also need to use the simplest model that adequately describes the data.

The applicability domain of the developed model at various values of log C_e/C_s was verified by the
leverage approach 9 using the plot of standardized residuals versus leverages (hat diagonals), i.e. the
Williams plot. If the absolute standardized residual of a compound is greater than three standard
deviation units (±3σ), the compound is regarded as an outlier. The leverage of a compound is defined as

$$h_i = x_i^T \left(X^T X\right)^{-1} x_i \qquad (8)$$

where x_i is the descriptor vector of the considered compound and X is the descriptor matrix derived from
the training set descriptor values. The warning leverage (h*) is defined as h* = 3(N + 1)/n, where N is the
number of independent variables in the modified model (N = 5) and n is the number of training compounds
(n = 11 in this study). If the leverage of a compound h_i > h*, the compound is very influential on the
model.
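
The leverage calculation in equation (8) and the warning leverage can be sketched as below. This is a hedged illustration rather than the procedure used in the study: the function names are assumptions, the descriptor vectors are assumed to carry a leading 1 for the regression constant (the text does not state this explicitly), and the small matrix in the example is hypothetical.

```python
def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting, then scale the pivot row so the pivot equals 1.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def hat_values(X_train, X_query):
    """Equation (8): h_i = x_i^T (X^T X)^-1 x_i for each row of X_query."""
    p = len(X_train[0])
    xtx = [[sum(row[i] * row[j] for row in X_train) for j in range(p)]
           for i in range(p)]
    inv = invert(xtx)
    return [sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p))
            for x in X_query]


def warning_leverage(n_variables, n_training):
    """h* = 3 (N + 1) / n."""
    return 3 * (n_variables + 1) / n_training


# Hypothetical training matrix (constant column plus one descriptor).
X_train = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(hat_values(X_train, X_train))
# N = 5 independent variables and n = 11 training compounds, as stated in the text.
print(warning_leverage(5, 11))
```

A compound would then be flagged as influential when its leverage exceeds h*, and as an outlier when its absolute standardized residual exceeds 3.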